.require "memo.pub[let,jmc]" source file;
.macro bf; ⊂
.nofill;
.turn on "\"
. ⊃
.
.macro bi; ⊂
.indent 8,12,0;
.turn on "\"; tabs 13,17,21;
. ⊃
.
.begin center
%3PROGRAMS WITH COMMON SENSE%1
by
John McCarthy
Summary
.end
Interesting work is being done in programming computers to solve problems
which require a high degree of intelligence in humans. However, certain
elementary verbal reasoning processes so simple that they can be carried
out by any non-feeble-minded human have yet to be simulated by machine programs.
This paper will discuss programs to manipulate in a suitable formal language
(most likely a part of the predicate calculus) common instrumental statements.
The basic program will draw immediate conclusions from a list of premises.
These conclusions will be either declarative or imperative sentences. When an
imperative sentence is deduced the program takes a corresponding action. These
actions may include printing sentences, moving sentences on lists, and
reinitiating the basic deduction process on these lists.
Facilities will be provided for communication with humans in the system via
manual intervention and display devices connected to the computer.
%7This paper is reprinted from
%3McCarthy, John (1960)%1: "Programs with Common Sense," Proceedings of the
Teddington Conference on the Mechanization of Thought Processes, Her Majesty's
Stationery Office, London.
The conference was held at the National Physical Laboratory, Teddington,
England in December 1959.
The computer file is COMMON[E80,JMC] at SU-AI.%1
.skip to column 1
The %2advice taker%1 is a proposed program for solving problems by manipulating
sentences in formal languages. The main difference between it and other programs
or proposed programs for manipulating formal languages (the %2Logic Theory
Machine%1 of Newell, Simon and Shaw and the Geometry Program of Gelernter)
is that in the previous programs the formal system was the subject matter but the
heuristics were all embodied in the program. In this program the procedures will
be described as much as possible in the language itself and, in particular, the
heuristics are all so described.
The main advantage we expect the %2advice taker%1 to have is that
its behavior will be improvable merely by making statements to it,
telling it about its symbolic environment and what is wanted from it. To
make these statements will require little if any knowledge of the program
or the previous knowledge of the %2advice taker%1. One will be able to
assume that the %2advice taker%1 will have available to it a fairly wide
class of immediate logical consequences of anything it is told and its
previous knowledge. This property is expected to have much in common with
what makes us describe certain humans as having %2common sense%1. We
shall therefore say that %2a program has common sense if it automatically
deduces for itself a sufficiently wide class of immediate consequences of
anything it is told and what it already knows.%1
The design of this system will be a joint project with Marvin
Minsky, but Minsky is not to be held responsible for the views expressed
here.
Before describing the %2advice taker%1 in any detail, I would like
to describe more fully our motivation for proceeding in this direction.
Our ultimate objective is to make programs that learn from their
experience as effectively as humans do. It may not be realized how far we
are presently from this objective. It is not hard to make machines learn
from experience to make simple changes in their behavior of a kind which
has been anticipated by the programmer. For example, Samuel has included
in his checker program facilities for improving the weights the machine
assigns to various factors in evaluating positions. He has also included
a scheme whereby the machine remembers games it has played previously and
deviates from its previous play when it finds a position which it
previously lost. Suppose, however, that we wanted an improvement in
behavior corresponding, say, to the discovery by the machine of the
principle of the opposition in checkers. No present or presently proposed
schemes are capable of discovering phenomena as abstract as this.
If one wants a machine to be able to discover an abstraction, it seems most
likely that the machine must be able to represent this abstraction in some
relatively simple way.
There is one known way of making a machine capable of learning
arbitrary behavior, and thus of anticipating every kind of behavior: this is
to make it possible for the machine to simulate arbitrary behaviors and
try them out. These behaviors may be represented either by nerve nets
(%2ref.2%1), by Turing machines (%2ref.3%1), or by calculator programs
(%2ref.4%1). The difficulty is two-fold. First, in any of these
representations the density of interesting behaviors is incredibly low.
Second, and even more important, small interesting changes in behavior
expressed at a high level of abstraction do not have simple
representations. It is as though the human genetic structure were
represented by a set of blue-prints. Then a mutation would usually result
in a wart or a failure of parts to meet, or even an ungrammatical
blue-print which could not be translated into an animal at all. It is
very difficult to see how the genetic representation scheme manages to be
general enough to represent the great variety of animals observed and yet
be such that so many interesting changes in the organism are represented
by small genetic changes. The problem of how such a representation
controls the development of a fertilized egg into a mature animal is even
more difficult.
In our opinion, a system which is to evolve intelligence of human
order should have at least the following features:
.skip
.begin bi
1.\All behaviors must be representable in the system. Therefore, the system
should either be able to construct arbitrary automata or to program in some
general purpose programming language.
2.\Interesting changes in behavior must be expressible in a simple way.
3.\All aspects of behavior except the most routine must be improvable. In
particular, the improving mechanism should be improvable.
4.\The machine must have or evolve concepts of partial success because on
difficult problems decisive successes or failures come too infrequently.
5.\The system must be able to create subroutines which can be included in
procedures as units. The learning of subroutines is complicated by the
fact that the effect of a subroutine is not usually good or bad in itself.
Therefore, the mechanism that selects subroutines should have a concept
of an interesting or powerful subroutine whose application may be good under
suitable conditions.
.end
Of the 5 points mentioned above, our work concentrates mainly on
the second. We base ourselves on the idea that: %2In order for a program
to be capable of learning something it must first be capable of being told it.%1
In fact, in the early versions we shall concentrate entirely on this point and
attempt to achieve a system which can be told to make a specific improvement
in its behavior with no more knowledge of its internal structure or previous
knowledge than is required in order to instruct a human. Once this is
achieved, we may be able to tell the %2advice taker%1 how to learn from
experience.
The main distinction between the way one programs a computer and
modifies the program and the way one instructs a human or will instruct
the %2advice taker%1 is this: A machine is instructed mainly in the form
of a sequence of imperative sentences; while a human is instructed mainly
in declarative sentences describing the situation in which action is
required together with a few imperatives that say what is wanted. We
shall list the advantages of the two methods of instruction.
%2Advantages of Imperative Sentences%1
.Skip
.Begin bi
1.\A procedure described in imperatives is already laid out and is carried out
faster.
2.\One starts with a machine in a basic state and does not assume previous
knowledge on the part of the machine.
.end
%2Advantages of Declarative Sentences%1
.Skip
.Begin bi
1.\Advantage can be taken of previous knowledge.
2.\Declarative sentences have logical consequences and it can be arranged
that the machine will have available sufficiently simple logical
consequences of what it is told and what it previously knew.
3.\The meaning of declaratives is much less dependent on their order than is
the case with imperatives. This makes it easier to have after-thoughts.
4.\The effect of a declarative is less dependent on the previous state of the
system so that less knowledge of this state is required on the part of the
instructor.
.END
The only way we know of expressing abstractions (such as the
previous example of the opposition in checkers) is in language. That is
why we have decided to program a system which reasons verbally.
.Skip
.once center
%3The Construction of the Advice Taker%1
The %2advice taker%1 system has the following main features:
.Skip
.Begin bi
1.\There is a method of representing expressions in the computer. These
expressions are defined recursively as follows: A class of entities
called terms is defined and a term is an expression. A sequence of
expressions is an expression. These expressions are represented in the
machine by list structures (%2ref.1%1).
2.\Certain of these expressions may be regarded as declarative sentences in
a certain logical system which will be analogous to a universal Post
canonical system. The particular system chosen will depend on programming
considerations but will probably have a single rule of inference which
will combine substitution for variables with modus ponens. The purpose of
the combination is to avoid choking the machine with special cases of general
propositions already deduced.
3.\There is an %2immediate deduction routine%1 which, when given a set of
premises, will deduce a set of immediate conclusions (a programmed sketch of
this routine, and of the representation of points 1 and 2, follows this
list). Initially, the
immediate deduction routine will simply write down all one-step consequences
of the premises. Later, this may be elaborated so that the routine will
produce some other conclusions which may be of interest. However, this
routine will not use semantic heuristics; i.e., heuristics which depend on the
subject matter under discussion.
\The intelligence, if any, of the advice taker will not be embodied in the
immediate deduction routine. This intelligence will be embodied in the
procedures which choose the lists of premises to which the immediate
deduction routine is to be applied. Of course, the program should never
attempt to apply the immediate deduction routine simultaneously to the
list of everything it knows. This would make the deduction routine take
too long.
4.\Not all expressions are interpreted by the system as declarative sentences.
Some are the names of entities of various kinds. Certain formulas
represent %2objects%1. For our purposes, an entity is an object if we
have something to say about it other than the things which may be deduced
from the form of its name. For example, to most people, the number 3812
is not an object: they have nothing to say about it except what can be
deduced from its structure. On the other hand, to most Americans the
number 1776 is an object because they have filed somewhere the fact that it
represents the year when the American Revolution started. In the %2advice
taker%1 each object has a %2property list%1 in which are listed the specific
things we have to say about it. Some things which can be deduced from the
name of the object may be included in the property list anyhow if the
deduction was actually carried out and was difficult enough so that the
system does not want to carry it out again.
5.\Entities other than declarative sentences which can be represented by
formulas in the system are individuals, functions, and programs.
6.\The program is intended to operate cyclically as follows. The immediate
deduction routine is applied to a list of premises and a list of
individuals. Some of the conclusions have the form of imperative
sentences. These are obeyed. Included in the set of imperatives which
may be obeyed is the routine which deduces and obeys.
\We shall illustrate the way the %2advice taker%1 is supposed to act by
means of an example. Assume that I am seated at my desk at home and I
wish to go to the airport. My car is at my home also. The solution of
the problem is to walk to the car and drive the car to the airport. First,
we shall give a formal statement of the premises the %2advice taker%1 uses
to draw the conclusions. Then we shall discuss the heuristics which cause
the %2advice taker%1 to assemble these premises from the totality of facts
it has available. The premises come in groups, and we shall explain the
interpretation of each group.
\1.\First, we have a predicate "%2at%1". "%2at(x,y)%1" is a formalization
of "%2x is at y%1". Under this heading we have the premises
\\1.\%2at(I,desk)%1
\\2.\%2at(desk,home)%1
\\3.\%2at(car,home)%1
\\4.\%2at(home,county)%1
\\5.\%2at(airport,county)%1
\We shall need the fact that the relation "%2at%1" is transitive, which might
be written directly as
\\6.\%2at(x,y), at(y,z) → at(x,z)%1
\or alternatively we might instead use the more abstract premises
\\6'.\%2transitive(at)%1
\\7'.\%2transitive(u) → (u(x,y), u(y,z) → u(x,z))%1
\from which 6. can be deduced.
\2.\There are two rules concerning the feasibility of walking and driving.
\\8.\%2walkable(x), at(y,x), at(z,x), at(I,y) → can(go(y,z,walking))%1
\\9.\%2drivable(x), at(y,x), at(z,x), at(car,y), at(I,car) → can(go(y,z,
driving))%1
\There are also two specific facts
\\10.\%2walkable(home)%1
\\11.\%2drivable(county)%1
\3.\Next we have a rule concerned with the properties of going.
\\12.\%2did(go(x,y,z)) → at(I,y)%1
\4.\The problem itself is posed by the premise:
\\13.\%2want(at(I,airport))%1
\5.\The above are all the premises concerned with the particular problem.
The last group of premises are common to almost all problems of this
sort. They are:
\\14.\%2(x → can(y)), (did(y) → z) → canachult(x,y,z)%1
\The predicate %2"canachult(x,y,z)"%1 means that in a situation to which
%2x%1 applies, the action %2y%1 can be performed and ultimately brings
about a situation to which %2z%1 applies. A sort of transitivity is described by
\\15.\%2canachult(x,y,z), canachult(z,u,v) → canachult(x,prog(y,u),v).%1
\Here %2prog(u,v)%1 is the program of first carrying out %2u%1 and then %2v%1.
(Some kind of identification of a single action %2u%1 with the one step
program %2prog(u)%1 is obviously required, but the details of how this will
fit into the formalism have not yet been worked out).
\The final premise is the one which causes action to be taken.
\\16.\%2x,canachult(x,prog(y,z),w), want(w) → do(y)%1
\The argument the %2advice taker%1 must produce in order to solve the
problem deduces the following propositions in more or less the following
order:
\\1.\%2at(I,desk) → can(go(desk,car,walking))%1
\\2.\%2at(I,car) → can(go(home,airport,driving))%1
\\3.\%2did(go(desk,car,walking)) → at(I,car)%1
\\4.\%2did(go(home,airport,driving)) → at(I,airport)%1
\\5.\%2canachult(at(I,desk), go(desk,car,walking), at(I,car))%1
\\6.\%2canachult(at(I,car), go(home,airport,driving), at(I,airport))%1
\\7.\%2canachult(at(I,desk), prog(go(desk,car,walking), go(home,airport,
driving)), at(I,airport))%1
\\8.\%2do(go(desk,car,walking))%1
\The deduction of the last proposition initiates action; the chain of
deductions itself is run in the second sketch following this list.
.END
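To make points 1 to 3 concrete: assuming expressions are represented as
nested tuples (standing in for list structures), with the lowercase letters
x, y, z, u, v, and w serving as variables, the single rule of inference and
the immediate deduction routine might be sketched in modern Python as
follows. The encoding is an illustrative assumption, not a specification.
.begin bf
VARS = {"x", "y", "z", "u", "v", "w"}   # everything else is a constant

def substitute(expr, env):
    # Replace bound variables throughout a (possibly nested) expression.
    if isinstance(expr, tuple):
        return tuple(substitute(e, env) for e in expr)
    return env.get(expr, expr)

def match(pattern, expr, env):
    # One-way matching of a pattern against a ground expression;
    # returns an extended binding environment, or None on failure.
    if isinstance(pattern, tuple):
        if not (isinstance(expr, tuple) and len(expr) == len(pattern)):
            return None
        for p, e in zip(pattern, expr):
            env = match(p, e, env)
            if env is None:
                return None
        return env
    if pattern in VARS:
        if pattern in env:
            return env if env[pattern] == expr else None
        return {**env, pattern: expr}
    return env if pattern == expr else None

def immediate_deduction(facts, rules):
    # For each rule, every way of matching its premises against known
    # facts (substitution for variables) licenses adding the
    # correspondingly substituted conclusion (modus ponens).
    new = set()
    for premises, conclusion in rules:
        envs = [{}]
        for p in premises:
            envs = [e for env in envs for f in facts
                    for e in [match(p, f, env)] if e is not None]
        for env in envs:
            new.add(substitute(conclusion, env))
    return new - facts
.end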
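Feeding this routine the first-order premises of the example (the
%2canachult%1 premises 14 to 16 quantify over sentences and are omitted
here), two cycles of deducing and obeying reproduce the essential steps.
Again, the encoding is an assumption made for illustration:
.begin bf
# Premises 1-5, 10 and 11 as ground facts.
facts = {("at", "I", "desk"), ("at", "desk", "home"), ("at", "car", "home"),
         ("at", "home", "county"), ("at", "airport", "county"),
         ("walkable", "home"), ("drivable", "county")}

# Premises 6, 8, 9 and 12 as (premise-list, conclusion) pairs.
rules = [
    ([("at", "x", "y"), ("at", "y", "z")], ("at", "x", "z")),        # 6
    ([("walkable", "x"), ("at", "y", "x"), ("at", "z", "x"),
      ("at", "I", "y")], ("can", ("go", "y", "z", "walking"))),      # 8
    ([("drivable", "x"), ("at", "y", "x"), ("at", "z", "x"),
      ("at", "car", "y"), ("at", "I", "car")],
     ("can", ("go", "y", "z", "driving"))),                          # 9
    ([("did", ("go", "x", "y", "z"))], ("at", "I", "y")),            # 12
]

# First cycle of immediate deduction: walking to the car is possible.
facts |= immediate_deduction(facts, rules)
assert ("can", ("go", "desk", "car", "walking")) in facts

# Obeying do(go(desk,car,walking)) records that the act was done; two
# further rounds then yield at(I,car), which lets premise 9 fire.
facts.add(("did", ("go", "desk", "car", "walking")))
facts |= immediate_deduction(facts, rules)
facts |= immediate_deduction(facts, rules)
assert ("can", ("go", "home", "airport", "driving")) in facts
.end
The choice of which derived imperative to obey is precisely where premises
14 to 16, and the heuristics discussed below, would come in.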
The above proposed reasoning raises two major questions of
heuristic. The first is that of how the 16 premises are collected, and
the second is that of how the deduction proceeds once they are found. We
cannot give complete answers to either question in the present paper; they
are obviously not completely separate since some of the deductions might
be made before some of the premises are collected. Let us first consider
the question of where the 16 premises came from.
First of all, we assert that except for the 13th premise
%2want(at(I,airport))%1, which sets the goal, and the 1st premise
%2at(I,desk)%1, which we shall get from a routine which answers the
question %2"where am I"%1, %2all the premises can reasonably be expected
to be specifically present in the memory%1 of a machine which has
competence of human order in finding its way around. That is, none of
them are so specific to the problem at hand that assuming their presence
in memory constitutes an anticipation of this particular problem or of a
class of problems narrower than those which any human can expect to have
previously solved. We must impose this requirement if we are to be able
to say that the %2advice taker%1 exhibits %2common sense%1.
On the other hand, while we may reasonably assume that the
premises are in memory, we still have to describe how they are assembled
into a list by themselves to which the deduction routine may be applied.
Tentatively, we expect the %2advice taker%1 to proceed as follows:
initially, the sentence %2"want(at(I,airport))"%1 is on a certain list
%2L%1, called the main list, all by itself. The program begins with an
observation routine which looks at the main list and puts certain
statements about the contents of this list on a list called "observations
of the main list". We shall not specify at present what all the possible
outputs of this observation routine are but merely say that in this case
it will observe that "the only statement on %2L%1 has the form
%2'want(u(x))'."%1 (We write this out in English because we have not yet
settled on a formalism for representing statements of this kind). The
"deduce and obey" routine is then applied to the combination of the
"observations of the main list" list, and a list called the "standing
orders list". This list is rather small and is never changed, or at least
is only changed in major changes of the advice taker. The contents of the
"standing orders" list has not been worked out, but what must be deduced
is the extraction of certain statements from property lists. Namely, the
program first looks at %2"want(at(I,airport))"%1 and attempts to copy the
statements on its property list. Let us assume that it fails in this
attempt because %2"want(at(I,airport))"%1 does not have the status of an
object and hence has no property list. (One might expect that if the
problem of going to the airport has arisen before, %2"want(at(I,
airport))"%1 would be an object, but this might depend on whether there
were routines for generalizing previous experience that would allow
something of general use to be filed under that heading). Next in order
of increasing generality the machine would see if anything were filed
under %2"want(at(I,x))"%1 which would deal with the general problem of
getting somewhere. One would expect that premises 6 (or 6' and 7'), 8,
9, and 12 would be so filed. There would also be the formula
.SKIP
.ONCE CENTER
%2want(at(I,x)) → do(observe(where am I))%1
which would give us premise 1. There would also be a reference to the next
higher level of abstraction in the goal statement which would cause a look
at the property list of %2"want(x)"%1. This would give us 14, 15, and 16.
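As a minimal sketch of this retrieval by increasing generality, in the same
Python encoding as before, with the generalization ladder hard-coded from
the text and labels standing in for the actual formulas:
.begin bf
# Property lists, keyed by the pattern the statements are filed under.
# The labels refer to the numbered premises of the example above.
property_lists = {
    ("want", ("at", "I", "x")):
        ["premise 6 (or 6' and 7')", "premise 8", "premise 9",
         "premise 12", "want(at(I,x)) -> do(observe(where am I))"],
    ("want", "x"):
        ["premise 14", "premise 15", "premise 16"],
}

def generalizations(goal):
    # The goal itself, then successively more schematic versions of it.
    # How this ladder is generated in general is left open by the text;
    # here it is simply written out.
    yield goal                              # want(at(I,airport))
    yield ("want", ("at", "I", "x"))        # want(at(I,x))
    yield ("want", "x")                     # want(x)

def assemble_premises(goal):
    premises = []
    for pattern in generalizations(goal):
        # A pattern with no property list is not an object; the attempt
        # to copy its statements simply fails and we move on.
        premises += property_lists.get(pattern, [])
    return premises

assemble_premises(("want", ("at", "I", "airport")))
# -> premises 6, 8, 9, 12 and the observe rule, then 14, 15, 16
.end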
We shall not try to follow the solution further except to remark
that on the property list of %2"want(at(I,x))"%1 there would be a rule
that starts with the premises %2"at(I,y)"%1 and %2"want(at(I,x))"%1 and has as
conclusion a search for the property list of %2"go(y,x,z)"%1. This would
presumably fail, and then there would have to be heuristics that would
initiate a search for a %2y%1 such that %2"at(I,y)"%1 and
%2"at(airport,y)"%1. This would be done by looking on the property lists
of the origin and the destination and working up. Then premise 9 would be
found which has as one of its premises %2at(I,car)%1. A repetition of the
above would find premise 8, which would complete the set of premises since
the other %2"at"%1 premises would have been found as by-products of
previous searches.
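The search for such a %2y%1 by working up from the origin and the
destination might, under the same assumptions, be sketched as:
.begin bf
def regions(thing, at_facts):
    # Everything the thing is (transitively) at, found by working up.
    out, frontier = set(), [thing]
    while frontier:
        here = frontier.pop()
        for (rel, x, y) in at_facts:
            if rel == "at" and x == here and y not in out:
                out.add(y)
                frontier.append(y)
    return out

at_facts = {("at", "I", "desk"), ("at", "desk", "home"),
            ("at", "car", "home"), ("at", "home", "county"),
            ("at", "airport", "county")}

# The y with at(I,y) and at(airport,y) is their common region: county.
common = regions("I", at_facts) & regions("airport", at_facts)
assert common == {"county"}
.end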
We hope that the presence of the heuristic rules mentioned on the
property lists where we have put them will seem plausible to the reader.
It should be noticed that on the higher level of abstraction many of the
statements are of the stimulus-response form. One might conjecture that
the division in man between conscious and unconscious thought occurs at the
boundary between stimulus-response heuristics, which do not have to be
reasoned about but only obeyed, and the others, which have to serve as
premises in deductions.
We hope to formalize the heuristics in another paper before we
start programming the system.
. SKIP 4
.BEGIN CENTER
%2DISCUSSION OF THE PAPER BY DR. J. MCCARTHY%1
.END
PROF. Y. BAR-HILLEL: Dr. McCarthy's paper belongs in the Journal of Half-Baked
Ideas, the creation of which was recently proposed by Dr. I. J. Good.
Dr. McCarthy will probably be the first to admit this. Before he goes on to
bake his ideas fully, it might be well to give him some advice and raise some
objections. He himself mentions some possible objections, but I do not think
that he treats them with the full consideration they deserve; there are others
he does not mention.
For lack of time, I shall not go into the first part of his paper,
although I think that it contains a lot of highly unclear philosophical,
or pseudo-philosophical assumptions. I shall rather spend my time in
commenting on the example he works out in his paper at some length.
Before I start, let me voice my protest against the general assumption of
Dr. McCarthy - slightly caricatured - that a machine, if only its
program is specified with a sufficient degree of carelessness, will be
able to carry out satisfactorily even rather difficult tasks.
Consider the assumption that the relation he designates by %2at%1
is transitive (page 81). However, since he takes both %2"at(I,desk)"%1
and %2"at(desk,home)"%1 as premises, I presume - though this is never made
quite clear - that %2"at"%1 means something like
being-a-physical-part-or-in-the-immediate-spatial-neighborhood-of. But
then the relation is clearly not transitive. If A is in the immediate
spatial neighborhood of B and B in the immediate spatial neighborhood of C
then A need not be in the immediate spatial neighborhood of C. Otherwise,
everything would turn out to be in the immediate spatial neighborhood of
everything, which is surely not Dr. McCarthy's intention. Of course,
starting from false premises, one can still arrive at right conclusions.
We do such things quite often, and a machine could do it. But it would
probably be bad advice to allow a machine to do such things consistently.
Many of the other 23 steps in Dr. McCarthy's argument are equally
or more questionable, but I don't think we should spend our time showing
this in detail. My major question is the following: On page 83 McCarthy
states that a machine which has a competence of human order in finding its
way around will have almost all the premises of the argument stored in its
memory. I am at a complete loss to understand the point of this remark.
If Dr. McCarthy wants to say no more than that a machine, in order to
behave like a human being, must have the knowledge of a human being, then
this is surely not a very important remark to make. But if not, what was
the intention of this remark?
The decisive question of how a machine, even assuming that it will
have somehow countless millions of facts stored in its memory, will be
able to pick out those facts which will serve as premises for its
deduction is promised to receive its treatment in another paper, which is
quite right for a half-baked idea.
It sounds rather incredible that the machine could have arrived at
its conclusion - which, in plain English, is "Walk from your desk to your
car!" - by sound deduction. This conclusion surely could not possibly
follow from the premises in any serious sense. Might it not be
occasionally cheaper to call a taxi and have it take you over to the
airport? Couldn't you decide to cancel your flight or to do a hundred
other things? I don't think it would be wise to develop a programming
language so powerful as to make a machine arrive at the conclusion Dr.
McCarthy apparently intends it to make.
Let me also point out that in the example the time factor has
never been mentioned, probably for the sake of simplicity. But clearly
this factor is here so important that it could not possibly be disregarded
without distorting the whole argument. Does not the solution depend,
among thousands of other things, also upon the time of my being at my
desk, the time at which I have to be at the airport, the distance from
the airport, the speed of my car, etc.?
To make the argument deductively sound, its complexity will have
to be increased by many orders of magnitude. So long as this is not
realized, any discussion of machines able to perform the deductive - and
inductive! - operations necessary for treating problems of the kind
brought forward by Dr. McCarthy is totally pointless. The gap between Dr.
McCarthy's general programme (with which I have little quarrel, after
discounting its "philosophical" features) and its execution even in such a
simple case as the one discussed seems to me so enormous that much more
has to be done to persuade me that even the first step in bridging this
gap has already been taken.
DR. MCCARTHY (in reply): Prof. Bar-Hillel has correctly observed that my
paper is based on unstated philosophical assumptions although what he
means by "pseudo-philosophical" is unclear. Whenever we program a
computer to learn from experience we build into the program a sort of
epistemology. It might be argued that this epistemology should be made
explicit before one writes the programme, but epistemology is in a foggier
state than computer programming even in the present half-baked state of
the latter. I hope that once we have succeeded in making computer
programs reason about the world, we will be able to reformulate
epistemology as a branch of applied mathematics no more mysterious or
controversial than physics.
On re-reading my paper I can't see how Prof. Bar-Hillel could see
in it a proposal to specify a computer program carelessly. Since other
people have proposed this as a device for achieving "creativity", I can
only conclude that he has some other paper in mind.
In his criticism of my use of the symbol "at", Prof. Bar-Hillel
seems to have misunderstood the intent of the example. First of all, I
was not trying to formalize the sentence form, A is at B, as it is used in
English. %2"at"%1 merely was intended to serve as a convenient mnemonic
for the relation between a place and a sub-place. Second, I was not
proposing a practical problem for the program to solve but rather an
example intended to allow us to think about the kinds of reasoning
involved and how a machine may be made to perform them.
Prof. Bar-Hillel's major point concerns my statement that the premises listed
could be assumed to be in memory. The intention of this statement is to explain
why I have not included formalizations of statements like, "it is possible to
drive from my home to the airport" among my premises. If there were %2n%1 known
places in the county there would be %2n(n-1)/2%1 such sentences and, since we are
quite sure that we do not have each of them in our memories, it would be cheating
to allow the machine to start with them.
The rest of Prof. Bar-Hillel's criticisms concern ways in which
the model mentioned does not reflect the real world; I have already
explained that this was not my intention. He is certainly right that the
complexity of the model will have to be increased for it to deal with
practical problems. What we disagree on is my contention that the
conceptual difficulties arise at the present level of complexity and that
solving them will allow us to increase the complexity of the model easily.
With regard to the discussion between Prof. Bar-Hillel and Oliver
Selfridge - the logic is intended to be faultless although its premises
cannot be guaranteed. The intended conclusion is
%2"do(go(desk,are,walking))"%1 not, of course, %2"at(I,airport)"%1. The
model oversimplifies but is not intended to oversimplify to the extent of
allowing one to deduce one's way to the airport.